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matrix  can  be  as  high  as  511  and  the  case  where  the  rank  is  between 
512  and  1023.  Finally,  we  utilize  STARAN  timing  data  to  develop  plots 
of  total  time  to  invert  the  matrix  versus  the  rank  of  the  matrix. 


Rectangular  Coordinates 

Recall  from  reference  1 that  inverting  a matrix  of  rank  N requires 
N divisions,  N(N-l)  multiplications  and  N(N-l)  subtractions. 

Suppose  we  have  a set  of  complex  numbers  x^  + iy^  for  j = 1,  2,.,.,N 
and  would  like  to  divide  these  numbers  in  parallel  by  one  complex  number 
from  within  the  set.  Call  the  divisor  x + iy.  Then,  in  order  to 
perform  the  division,  we  must  form  the  complex  conjugate  and  multiply 
as  follows: 


(x^  + iy^)  (x  - iy) 
U + iy)  (x  - iy) 


xix_  +-yZ  + t -Y 
2 2 1 2 2 
x + y x + y 


If  we  assume  that  the  x^  are  in  one  column  of  the  array  and  the  y^  are  in 
another  column  then  it  will  require  four  comparand  to  field  multiplies 
to  form  the  following  products: 


XjX,  xy^,  y^y  and  x^y. 
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We  can  then  perform  one  addition  to  form 


and  one  subtraction  fo  form 


We  then  must  divide  each  of  the  above  by 


Note  that  this  sum  is  already  available  since  x and  y are  contained  in 


the  set  of  numbers.  We  thus  obtain  the  time  for  division  of  one  column 


of  complex  numbers  by  a single  complex  number  to  be 


where  t is  the  comparand  to  field  multiply  time,  t is  the  field  to 


field  add  time,  t is  the  field  to  field  subtract  time  and  t 


comparand  to  field  divide  time.  If  we  use  STARAN  time  values  we  obtain 


r 


T__  =*  4(715)  + 412  + 412  + 2(772)  - 5228  psec. 
DK 


In  all  of  the  calculations  using  STARAN  timing  values  we  will 
assume  that  the  floating  point  arithmetic  capability  is  implemented  in 
software  and  that  we  are  operating  with  32  bit  field  lengths.  Further- 
more, it  should  be  pointed  out  that  the  time  to  perform  data  movement 
and  other  non-arithmetic  operations  has  been  omitted. 

Using  similar  assumptions  consider  the  multiplication  of  a column 
of  complex  numbers  by  a complex  number: 


(Xj  + iy^)  (x  + iy)  = (x^x  - y^y)  + i(x^y  + xy^). 


In  this  case  we  have  four  comparand  to  field  multiplies  to  form  the  cross 
products,  one  addition  and  one  subtraction.  We  thus  obtain: 


MR 


4 t + t + t . 
me  a s 


For  STARAN  time  values  we  obtain 


T„_  * 4(715)  + 412  + 412  = 3684  Psec. 
MR 


i 


Subtraction  is  by  far  the  least  time  consuming.  For  a field  to 
field  subtraction  we  have: 


Cxj  + iy.)  - (xR  + iyk)  = (x.  - xk)  + i(y.  - yR) 


which  results  in  only  two  subtractions  or 


T__  = 2 t . 
SR  s 


For  STARAN  time  values  we  obtain: 


T = 2(412)  = 824  usee. 
bK 


If  we  now  consider  N divisions,  N(N-l)  multiplications  and  N(N-l) 
subtractions  we  obtain  the  total  arithmetic  time  as: 


tr  ■ '"dr  + S(N-1)  t«r  + N(',-1)  tsr- 


For  STARAN  timing  values  this  reduces  to: 


T_  = N(5228  + (N-l)  (3684  + 824)) 
K 

= 4508N^  + 720N  usee. 
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Polar  Coordinates 

Now  suppose  the  complex  numbers  a:*e  In  polar  coordinates  such  that 
191 

we  have  numbers  r^e  J for  j - 1,2 N and  would  like  to  divide  and 

multiply  these  numbers  by  another  complex  number  re16  which  is  again 
a member  of  the  original  set.  Also,  we  would  like  to  subtract  two 
columns  of  complex  numbers. 

Considering  division  we  have: 


Again,  assuming  that  the  r^  and  0^  are  in  two  columns  of  the  array  the 
f.iwe  it  would  take  to  perform  a division  of  complex  numbers  is  determined 
by  one  comparand  to  field  divide  and  one  comparand  from  field  subtraction. 
Thus,  we  obtain 


or  for  STARAN 


772  + 412  m 1184  Psec. 


for  multiplication  we  must  obtain 


i0,  ,0  i(0 , + 6) 

(r.e  J)  (re10)  = r^e  J 


or  one  comparand  to  field  multiply  and  one  comparand  to  field  addition. 
We  thus  obtain 


= t + t 
MP  me  ac 


or  for  STARAN  = 715  + 412  = 1127  psec. 

Mr 

While  there  appears  to  be  considerable  time  advantage  in  using 
polar  coordinates  for  division  and  multiplication  the  advantage  disappears 
when  subtraction  is  considered.  Consider  the  difference  between  the 
following  columns  of  numbers: 


I6j  lek 

rje  - rke 


In  order  to  take  this  difference  we  must  first  transform  these  numbers 
to  their  equivalent  sine  and  cosine  representations.  Thus,  we  obtain 
the  following: 


and 


r^e  J “ rj  (cos  ®j  + i sin  6^) 


i9k 

rke  * rfe  (cos  0k  + i sin  0fc) 


i61  i6k 

rjG  “ rke  = ^rj  cos  6j  " rk  C0S  9k^ 


+ i(r  sin  0 - r,  sin  6,  ) 

J j k k 


In  order  to  perform  these  calculations  we  must  perform  two  cosine  functions, 
two  sine  functions,  four  multiplies  and  two  subtractions.  Now,  given 
these  results  we  must  transfer  back  to  our  previous  polar  coordinate 
form.  For  ease  of  notation  suppose  we  let  X and  Y be  the  following: 

X + iY  = (r.  cos  0.  - r,  cos  0,  ) 

1 j k k 

+ i(r,  sin  0 - r,  sin  9,). 

j j k k 

The  result  that  we  are  seeking  is  the  following  in  polar  coordinates: 

(V7T?;  e1  ta"‘l<I>) 


In  order  to  perform  this  operation  we  must  reproduce  X and  Y, 

2 2 

multiply  them  together  to  get  X and  Y , take  the  square  root,  divide 
Y by  X and  take  an  arctan.  When  we  put  all  of  these  operations  to- 


gether we  have  the  following  equation  for  the  subtraction  of  two 
columns  in  polar  coordinates: 


2 t + 2 t . + 6 t + 2 t + t , + t+  t _ + t . 

cos  sin  m s d a arctan  sqrt 


where  t is  the  time  to  take  the  cosine  of  a column  of  numbers  in  the 
cos 

array:  t . is  the  time  to  take  the  sine;  t , t , t and  t , are  the 
sin  a ’s’  m d 

field  to  field  add,  subtract,  multiply  and  divide  times  respectively; 
t is  the  time  required  to  take  the  arctangent  of  a column  of 

ax  C Call 

numbers;  and  tg^rt  is  the  square  root  time.  The  time  to  reproduce  the 
X and  Y columns  has  been  omitted  since  it  is  small  in  comparison  to  the 
other  times. 

If  we  utilize  STARAN  timing  values  we  obtain  the  following  time 


Tgp  - 2(12363)  + 2(12363)  + 6(832)  + 2(412) 


+ 890  + 412  + 12363  + 652  = 69,585  Usee. 
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Now,  if  we  add  together  the  division,  multiplication  and  subtraction 


times  we  obtain: 


Tp  - ntdp  + N(N-1)  tmp  + N(N“1)  TSP- 


For  STARAN  timing  values  this  reduces  to 


Tp  = N(1184  + (N-l)  (1127  + 69,585)) 


70712  N - 69528  N psec 


If  we  again  consider  a matrix  of  rank  = 500  we  obtain  T ■ 17643  sec. 

P 

It  is  apparent  that  the  large  time  required  for  the  inversion 
process  when  polar  coordinates  are  used  is  mainly  due  to  the  underlying 
subtractioa  processes  that  are  involved.  The  subtraction  process  re- 
quires the  use  of  the  sine  and  cosine  functions  and  these  in  turn  re- 
quire enough  execution  time  to  more  than  offset  the  savings  that  are 
made  during  the  division  and  multiplication  processes. 


W'.'XI 

■ ' m 


Coordinate  Conversions 

To  complete  our  discussion  of  which  complex  number  representation 
scheme  is  most  suitable  for  matrix  inversion  purposes,  let  us  now 
consider  the  two  other  cases  that  are  possible,  viz:  start  with  the 
matrix  elements  represented  in  rectangular  coordinates  and  go  ahead  with 
the  inversion  process,  converting  to  polar  representation  and  back  when 
deemed  desirable,  and  vice  versa.  We  now  consider  how  these  schemes 
perform  in  terms  of  inversion  times. 

From  the  discussion  of  the  subtraction  process  involved  with  polar 
coordinates,  it  is  clear  that  conversion  of  a set  of  numbers  in  rectangular 
coordinates  to  polar  coordinates,  i.e., 

(x,y) * (Vx2  + y^,  tan  1 

requires  an  amount  of  time  equal  to: 


T - 

RP 


2 t + t.  + t + t _ + L 

m d a arctan  sqrt 


For  STARAN  timing  values  this  reduces  to: 


- 2(832)  + 890  + 412  + 12363  + 652 
“ 15981  p secs. 
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Also,  conversion  of  a set  of  numbers  from  polar  coordinates  to 
rectangular  coordinates,  i.e.. 


(r,0) *(r  cos  0,  r sin  0) 


would  require  an  amount  of  time  equal  to: 


TPR  = Ccos  + tsin  + 2 tm* 


For  STARAN  timing  values  this  reduces  to: 


TpR  = 2(12363)  + 2(832) 


26390  Usees. 


If  we  represent  the  results  obtained  above  in  tabular  form,  as 
shown  in  Table  1,  we  find  that  working  with  rectangular  coordinates 
clearly  outperforms  the  case  tfhere  polar  coordinates  are  used.  Although 
the  use  of  polar  coordinates  results  in  some  savings  in  time  for  division 
and  multiplication  operations,  these  savings  are  more  than  offset  during 
the  subtraction  process,  because  of  the  large  amounts  of  time  required 
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TABLE  Is  Execution  Times  (usee)  vs.  Coordinate 

Representation.  (STARAN  time  values  used) 


Rectangular  Coordinates 

Polar  Coordinates 

Division 

5228 

1184 

Multiplication 

3684 

1127 

Subtraction 

824 

69585 

Matrix  Inversion 

4508  N2  + 720  N 

70712  N2  - 69528  N 

Conversion  to 

26390 

15981 

Inversion  of  Complex  Matrices  Using  an  Associative  Processor  (AP) . 

The  process  of  inverting  matrices  with  real  elements  on  an  AP 

was  discussed  in  [lj.  In  principle,  we  can  use  the  same  Gauss-Jordan 
method  for  inverting  matrices  with  complex  elements,  except  that  the 
algorithms  have  to  be  modified  to  reflect  the  fact  that  the  arithmetic 
operations  have  complex  numbers  as  their  arguments,  etc.  As  discussed 
in  [1],  for  reasons  of  storage  efficiency,  we  want  to  start  off  with 
only  the  first  row  of  the  identity  matrix  appended  to  the  top  of  the 
given  matrix,  instead  of  appending  the  entire  identity  matrix  to  the 
given  matrix.  We  have  already  made  the  assumption  that  the  matrix 
elements  are  available  to  us  in  rectangular  coordinates;  we  now  make 
the  additional  assumption  that  the  first  row  of  the  identity  matrix 
has  been  appended  to  the  top  of  the  given  matrix  in  mass  storage,  so 
that  we  shall  consider  the  matrix  (Z) , as  it  resides  in  mass  storage, 
to  be  (N  + 1)  rows  by  N columns. 

Shown  in  Figure  1 is  the  solution  of  a 3 x 3 matrix  with  complex 
numbers.  The  technique  that  is  utilized  to  determine  the  inverse  of 
this  matrix  on  an  AP  will  be  illustrated  in  some  detail  subsequently. 


V 


1 s 


1 + il 


1 + il 


-2-12 


3 + i3 


2 + i2  -3-13 


1 + il 


1 + il 


We  would  then  start  initially  with: 


1 + iO 


0 + iO 


0 + iO  First  row  of  identity  matrix. 


1 + il 


0 + 10 


1 + il 


-2  - i2 


2 + i2 


1 + il 


3 + i3 


-3  - 13 


1 + il 


On  successive  iterations  of  the  inversion  process  we  obtain: 


0.5  - 10.5 


1 + 10 


0 + 10 


1 + 10 


0 + iO 


-2  - 12 


2 + i2 


1 + 12 


0 + 10  After  dividing  column  1 
3+13  (the  pivot)  by  1 + il. 


-3  - 13 


1 + il 


0.5  - 10.5 


1 + 10 


0 + 10 


1 + 10 


2 + iO 


0 + iO 


2 + 12 


3 + i3 


-3  + iO  After  performing  elementary 
0 + iO  column  operations  to 


-3  - 13]  normalize  columns  2 and  3 


-2  - 12  with  respect  to  the  pivot. 


FIGURE  1:  Inversion  of  a 3 x 3 Matrix  With  Complex  Elements. 
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13 
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FIGURE  1;  Inversion  of  a 3 x 3 Matrix  with  Complex  Elements 


(Continued) 


Third  column  becomes 


the  pivot 


FIGURE  1{  Inversion  of  a 3 x 3 Matrix  With  Complex  Elements 


(Continued) 


For  the  general  case,  let  the  matrix,  Z *>e  represented  as: 


Z - [Z^  Z2  ...  2^] 

- [*1  + 1 yx  x2  + i y2  ...  5^  + i yNl, 


where,  Z is  the  (N+l)  x N matrix  with  complex  elements, 


Z is  the  nth  column  of  Z, 
n 


x are  the  real  components  of  Z , 


y^  are  the  imaginary  components  of  2^, 


n 1,...,N« 


We  can  thus  think  of  the  matrix  elements  as  constituting  2N  column 


vectors  (x  and  y ) of  length  (N+l)  each.  We  now  make  a third  assumption, 
n n 


viz.  the  vectors  x^  and  y^,  can  be  individually  retrieved  from  the  mass 


storage  when  desired  for  loading  onto  the  associative  arrays.  This  is 


tantamount  to  saying  that  we  can  address  the  vectors  x md  y individually 

n n 


for  moving  them  to  and  from  the  mass  storage  and  the  associative  arrays. 


such  that  we  can  directly  load  x^  and  y^  into  two  separate  fields  of  an 


associative  array,  or,  in  the  same  field,  one  below  the  other. 


r 


Before  beginning  the  discussion  of  the  inversion  algorithm,  we  must 

introduce  some  additional  notation.  Let 

W = Number  of  words  in  an  associative  array 

A = Number  of  associative  arrays 

| WA  , . . _ WA 

L 'BoSl)  " lar8est  lntege,:  lnTo5iT 

= Number  of  (x^,  yR)  vector  pairs  (corresponding  to  matrix 
columns)  that  can  be  loaded  in  an  AP  field, 

M = L |_(N-1)  “ the  smaller  of  L and  (N-l) , 

K = 

Proceeding  with  the  inversion  process  as  in  [1],  we  want  to  bring 
the  first  column  ( real  and  imaginary)  of  the  matrix  into  one  field  of 
the  AP  and  then  process  each  of  the  remaining (N-l)  columns  of  the  matrix 
relative  to  the  first  column,  according  to  the  Gauss-Jordan  Elimination 
Method.  We  next  use  the  second  column  of  the  partly  processed  matrix  as 
the  Divot  and  process  the  other  (N-l)  columns  relative  to  it,  and  so  on. 
In  order  to  process  as  many  columns  of  the  matrix  in  parallel  against  a 
pivot  as  we  can,  we  must  load  as  many  columns  of  the  matrix  as  possible 
into  the  same  field  of  the  AP.  A maximum  of  L matrix  columns  (for  N x N 
matrices  with  complex  elements,  and  with  the  first  row  of  the  identity 
matrix  appended  to  their  top)  can  be  loaded  in  an  AP  field.  However,  if 


P 


= smallest  integer  greater  than  or  equal  to 


(N-l) 

M 


(N-l)  is  less  than  L,  we  need  to  load  only  those  (N-l)  columns  in  the 
associative  array  field  for  processing  against  the  pivot.  M is  thus 
the  number  of  matrix  columns  to  be  actually  loaded  into  an  AP  field 
for  processing  against  the  pivot.  After  these  M columns  have  been 
processed,  we  get  and  process  M more  columns,  if  there  remain  that  many, 
and  so  on.  A total  of  K iterations  would  thus  be  required  to  process  the 
(N-l)  columns  relative  to  the  given  pivot. 

Unless  otherwise  stated  we  will  work  with  the  assumption  that  the 

AP  has  1024  words  (viz.  the  RADC  STARAN  with  4 arrays).  It  is  evident 

then  that  as  long  as  the  rank  of  the  matrix  is  less  than  or  equal  to  511, 

WA 

(or  more  generally  if  N < ) one  column  (or  more)  of  the  matrix  can 

entirely  fit  into  a field  of  the  AP.  However,  if  the  rank  of  the  matrix 
is  512  or  more,  not  even  one  entire  column  can  fit  into  a field.  Still, 
if  the  rank  of  the  matrix  happens  to  be  1023  or  less  we  can  overcome  this 
problem  somewhat  by  loading  the  real  components  of  the  column  in  one  field 
and  the  imaginary  components  in  another  and  proceeding  with  the  arithmetic 
operations  accordingly,  though  this  results  in  degradation  of  performance 
over  the  case  where  the  rank  of  the  matrix  is  511  or  less. 
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We  define 


This  case  is  concerned  with  matrices  in  which  N < — j . 

Fj  as  field  j of  the  AP  where  j is  an  integer.  In  this  report  Fj  will 

normally  be  32  bits  in  length  so  that  there  will  be  eight  possible  fields 

or  j = 1,...,8.  Let  Fj  denote  word  m of  field  j where  0 m^  WA  - 1. 

m 

It  should  be  noted  that  in  some  places  in  the  algorithms  that  follow,  m * n 
and  n is  used  for  convenience.  We  need  a method  of  referring  to  columns 

of  data  N + 1 words  long.  We  will  utilize  the  index  k to  denote  a 

column  of  data.  Thus,  Fj^  denotes  words  (k-l)« (N+l)  thru [(k-1)* (N+l)  + N] 
of  field  j.  (i.e.  the  N + l words  of  field  j starting  with  word  (k-1) • (N+l)). 
Let  RM  denote  the  word  mask  register  and  CR  the  common  register. 

Also,  let  denote  the  assignment  statement,  such  that  the  argument 
on  the  right  of  the  arrow  is  assigned  to  be  the  content  of  the  argument 
on  the  left  of  the  arrow;  let1^1  represent  the  same  meaning  as  ' except 
that  the  assignment  operation  involves  loading  the  arrays  from  the  mass 

i t 

memory;  and  let  zzjr  denote  an  output  operation  from  the  arrays  to  the  mass 
memory. 

The  word  mask  register  can  be  thought  of  as  a 1-bit  wide  field.  With 

this  in  mind,  we  can  define  RM  and  RM  to  have  the  same  meaning  as  in 


the  case  of  fields. 


We  shall  also  make  the  assumption  that  we  have  a total  of  at  most 


seven  working  fields  available  to  us  (This  would  correspond  to  the  STARAN 


array  memory  when  32-bit  fields  are  used;  the  STARAN  requires  one  field 


to  be  left  aside  for  use  by  the  machine) 


The  algorithm  for  inverting  matrices  belonging  to  class  A on  an 


AP  follows.  Memory  maps  of  the  AP  at  various  stages  of  execution  of 


the  algorithm  are  shown  in  Figure  2 . The  circled  letters  within  the  memory 


maps  are  used  as  keys  to  the  corresponding  sets  of  statements  within  the 


algorithm 


Algorithm  1 


Comment:  In  Aj columns  of  data  are  loaded  into  field  FI  of  the  arrays 


(All  words  in  the  array  are  active) 


(n  and  k here  are  counters  (registers)  available  to  the  control  unit) 


Fields 


Figure  %.  - Continued 
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Figure  2.  - Continued 


A 


Comment:  The  steps  in  E and  F cause  the  following  to  be  performed:  multiply 

the  reals  by  a minus  and  add  F4  and  F5. 


18.  RM1,3,,***k"2<— 0 

19.  F5«r-  - F5 


(20.  RM^-1 

[21.  F6<-F4  + 


Comment:  The  steps  in  G complete  the  normalization  of  the  pivot  element 

to  one. 


• 

CM 

.f'* 

CR< — F6 

n 

23. 

FI* — F6  ♦ 

CR 

24. 

F6«r—  FI 

25. 

CR* — F6 

n 

26. 

CR< — CR  - 

1 

27. 

F6  <—  CR 

Comment:  The  n word  of  field  6,  which  has  just  been  reduced  to  1,  is 
now  replaced  with  a zero,  prior  to  loading  the  contents  of  the 
field  on  ■ the  mass  memory  because  the  corresponding  matrix 
column  will  no  longer  act  as  a pivot  in  future  iterations. 


28 


28.  F61 

29.  ^ F62 


Comment:  The  n on  top  of  '4'  implies  that  the  vectors  being  read  out 

correspond  to  the  n*"^1  column  of  the  matrix. 


Comment:  The  steps  in  H place  the  new  and  5?  in  F2  with  on  top. 

'n  n ■'n  r 

(These  are  the  same  vectors  that  were  just  previously  read  out. 
Doing  this  operation  in  this  way  conserves  time.) 


\ 


3Q,  k<—  1 

(3l)  F2^=j 


32.  k£—  k + 1 

33.  F2%=  ? 

n 

34.  kv— k + 1 

25,  Go  to  (£)  if  (k  < M) 

36.  nl«-  1 

37.  n24 — 1 

(nl  and  n2  are  counters) 


Comment:  The  steps  in  I bring  in  additional  columns  of  the  matrix  to  F3. 
In  Figure  2 we  show  the  first  two  columns  of  the  matrix. 


Eventually,  all  columns  would  be  brought  in  with  the  exception 
of  the  n^*1  column. 

k <-  1 

Go  to(^^if  Cnl  = n) 

P3^ 

k <—  k + 1 

F3“«-  F„i 

k*-k  + 1 

nl«—  nl  + 1 

Go  to  (39^  if  {k  < M)  and  (nl  < N)} 
l *-l 

RH  <—  0 except  RM*’  ^ + 1 
(JL  is  a counter) 

In  steps  J and  K the  columns  are  multiplied  by  the  proper  value(s) 
in  preparation  for  the  subtraction  process. 

CR  «-F3(Jl-1)  (N+1)  + n 
FA-*- FI  x CR 

30 


Comment:  The  steps  in  I bring  in  additional  columns  of  the  matrix  to  F3. 
In  Figure  2 we  show  the  first  two  columns  of  the  matrix. 
Eventually,  all  columns  would  be  brought  in  with  the  exception 
of  the  n*"*1  column. 


f® 

k <-  1 

<8) 

Go  to  (44) if  (nl  = n) 

40. 

k«—  k + 1 

• 

<r 

F3k<- 

43. 

k<i — k + 1 

© 

nl«-  nl  + 1 

_«• 

Go  to  (39^  if  Ck  < M)  and  (nl  < N)} 

46. 

l *-l 

© 

RM  *-  0 except  RM**  * + 3 <— 1 

Ci  is  a counter) 

Comment:  In  steps  J and  K the  columns  are  multiplied  by  the  proper  value(s) 

in  preparation  for  the  subtraction  process. 


CR  <-F3()t-l)  (N+l)  + n 
F4  4-F1  x CR 


30 


50 


. CR<  F3a(n+1)  + n 

51.  F5  <—  F2  x CR 

52.  l + 2 

53.  Go  to  (47)  if  < k) 
54  . RM  <-  1 


55,  RM2,4’*”‘k-1<- 0 

56.  F5<-  - F5 


Comment:  In  L and  M the  processing  of  x^,  y^  and  x^,  is  completed. 

The  rest  of  the  algorithm  deals  with  processing  the  rest  of  the 
columns  and  stopping  conditions. 


57. 

RM  <- 1 

.58. 

F6<-  F4  + F5 

{59. 

F5  F3  - F6 

60. 

lf-1 

© 

Go  to  (6?)  if  (n2  ^ n) 

62. 

F5(*-1)*(N+1)  + n 

63. 

CR  *-  CR  + 1 

64. 

F5(i-1)*  (N+l)  + n*~  CR 

© 

^4  F5* 

66. 

l*r~  l + 1 

31 
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67. 

,%  F5* 

68. 

t*-l  + 1 

69. 

n2< — n2  + 1 

70. 

Go  to  (6?)  if 

(Jl  < k)  and 

71. 

Go  to  (38)  if 

(nl  < N) 

72. 

n«—  n + 1 

73. 

Go  to  (?)  if 

(n  < N) 

74. 

STOP 

We  < 

can  now  use  the 

matrix  A of 

1 be 

used  to  invert 

this  matrix 

— > 
X 

n 


n = 1,...N,  corresponding  to  A are  shown  in  Figure  3.  Shown  in  Figures  4.1 
to  4.4  are  the  AP  memory  maps  that  would  result  during  a partial  inversion 
of  the  above  matrix  using  algorithm  1.  Circled  letters  within  the  Figures 
are  again  used  as  keys  to  the  corresponding  sets  of  statements  within  the 
algorithm. 


Initially,  vectors  x^  and  y^  are  loaded  into  fields  FI  and  F2  of  the 


AP  as  shown  in  Figure  4.1.  This  corresponds  to  the  first  column  of  the 
matrix  as  being  the  pivot.  The  process  of  division  of  this  pivot  by  the 
first  diagonal  element  of  the  matrix  is  contained  in  Figures  4.1  and  4.2. 


1 


12 


I 


During  Inversion  of  the  A Matrix 


( 


Once  the  pivot  has  been  so  'normalized*,  columns  2 and  3 of  the  matrix 
would  be  ready  to  be  processed  against  it  and  as  such  these  columns 
are  brought  into  the  AP  (in  field  F3)  as  shown  in  Figure  4.3.  The 
processing  of  columns  2 and  3 of  the  matrix  against  the  pivot  is  shown 
in  Figures  4.3  and  4.4. 

The  rest  of  the  inversion  process  would  be  identical  to  that  shown 
in  Figures  4.1  to  4.4,  except  that  we  would  next  use  column  2 of  the 
matrix  as  the  pivot,  'normalize  it',  process  the  rest  of  the  matrix 
columns  against  it,  and  finally  do  the  same  with  column  3. 


P 


r 


Case  B 


Matrices  with  ~ ■$  N ^ (WA  - 1) , 


When  the  rank  of  the  matrix  is  greater  than  or  equal  to  — j, 
a complete  column  of  the  matrix  (both  the  x^  and  vectors)  can  no 
longer  be  fit  into  one  field  of  the  AP.  For  these  matrices,  the 
inversion  algorithm  has  thus  to  be  revised  accordingly.  Note  that 
since  N is  less  than  WA,  we  can  still  go  about  our  business,  albeit 
somewhat  inefficiently,  by  splitting  the  matrix  column  in  two  fields: 
the  real  part  in  one  field  and  the  imaginary  part  in  another.  However, 
for  matrices  with  rank  greater  than  or  equal  to  WA,  there  would  be  a 
much  greater  than  proportional  decrease  in  efficiency  since  a matrix 
column  would  have  to  be  spread  over  more  than  two  fields  of  the  array, 
i.e.,  it  would  have  to  be  processed  in  parts. 

The  algorithm  for  inverting  case  B matrices  on  an  AP  follows.  The 
memory  maps  at  various  stages  of  execution  of  the  algorithm  are 
illustrated  in  Figure  5.  Keys  are  provided  as  before  from  Figure  5 to 
the  corresponding  steps  in  the  algorithm. 


Algorithm  2 


0.  RM  <- 1 

1.  n<—  1 

A {©  FI*  xn 
b (3. 

(u.  CR  <-  FI 

C 1 " 

5.  F3<—  FI  x CR 


Figure  5.  - Oor * xnued 


gp^vv  w 
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Q. 

R 

S 

T 


- 29. 

F7<— F5  - F6 

• 30. 

F6< — F3  - F7 

31. 

Go  to  j 

32. 

CR*—  F6 

n 

33. 

CR< — CR  + 1 

34. 

F6  * — CR 
rr 

<B> 

&>F6 

-36. 

F5<r-Fl  x CR 

'37. 

CRf-F3 

n 

38. 

F6< — F2  x CR 

f39. 

F7< — F5  + F6 

-jo. 

F6V-F4  - F7 

41. 

>F6 

0 

nl^_  nl  + 1 

43. 

Go  to  (2$\  if  ( 

44. 

n« — n + 1 

45. 

Go  to(2^if  (n 

46. 

STOP. 
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After  complete  execution  of  the  algorithm  we  would  end  up  with  a 
(N+l)  x N matrix  in  mass  storage,  the  first  N rows  of  which  are  the 
desired  inverse  (the  last  row  would  correspond  to  the  last  row  of  the 
identity  matrix).  The  algorithm  described  above  requires  the  fewest 
number  of  steps  of  all  the  various  other  possibilities  that  were 


considered 


* m umn 


1 - •" 


1 


Development  of  Timing  Expressions 


In  the  development  of  the  timing  expressions  we  shall  consider  all 
the  AP  operations  Involved  In  the  algorithms  viz:  arithmetic  operations 
and  setting  of  the  mask  register  and  common  register,  as  well  as  array 
loading  and  unloading  times.  We  shall,  however,  neglect  the  times  to 
search  for  and  prepare  the  data  in  mass  memory  prior  to  loading.  We 
shall  also  neglect  the  times  for  setting  counters  in  the  AP  control  unit 
(as  in  steps  1 and  2 of  the  algorithm),  which  are  negligible  compared 
to  the  other  times. 

Shown  in  Table  2 are  memory  and  arithmetic  operation  times  for  the 
various  operations  that  are  used  in  the  previously  developed  algorithms. 
These  times  are  based  on  the  RADC  STARAN  computer. 

The  timing  expression  for  executing  the  inversion  algorithm  can  be 
derived  by  adding  the  times  required  for  the  individual  steps,  in  the 
algorithm,  of  course  taking  into  account  all  the  loops  that  exist.  This 
ads  to  the  following  expression  for  Case  A (M  has  the  same  meaning  as 
defined  on  page  20) : 


Tima. 


N[Mx32xt1  + Mx32xt1 

+ t,  + t + t,  + t + t +t  +t  + t + t.  + t . 
T.c  m ic  m tin  ram  rm  a T.c  d 

-*-t  + t,  +t  , + t + 2 x 32  x t. 

nc  lc  crd  wc  1 

+ M x 32  x t, 


+ 2 (N-l)  x 32  x t, 


+ (N-l)  (t  + 
rm 


t,  +t+t,  + t ) + k (t  + t + t +t+t) 

T.c  m lc  m rm  nm  rm  a s 

+ 2 (N-l)  x 32  x 

+ t.  + t . + t ], 
lc  cri  wc 


V 

\ 
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0.3  usees  . 

.64 

usees. 

.26 

usees. 

715 

usees. 

772 

usees. 

412 

usees. 

66n  + 

3.74usecs 

412 

usees. 

60n  + 

,51usecs. 

.25 

usees. 

.25 

usees. 

.86 

usees. 

*n  denotes  the  field  width;  for  purposes  of  this  analysis,  n 


= time  to  load  a bit-slice 
= time  to  load  CR  from  AM  word 
= time  to  set  M response  store 
= time  to  multiply  field  by  CR  - FL  - SP 
= time  to  divide  field  by  CR  - FL  - SP 

= time  to  add  field  to  field  - FL  - SP 

= time  to  move  neg  of  a field 
= time  to  subtract  field  firm  field  - FL  - SP 
= time  to  copy  a field  of  width  n 
= timr  to  decrement  CR 

= time  to  increment  CR 

= time  to  store  CR  into  AM  vord 


TABLE  2:  Memory  and  Arithmetic  Operation  Times  for  STARAN 


Times* 


OPERATION 


Which  simplifies  to: 


TlaeA  " t,[  (3M  + 2)  x 32t.  + t 

i nc 


+ 5tlc  + 2t.m  + + 2tm  + ta  + + t_  + t_,  + 


m 


crd 


cri 


2t 


+ (N-l)  (4  x 32  t.  + t + 2t,  + 2t  ) 

1 rm  lc  m' 

+ K(2t  + t + t + t ) ] . 
rm  ran  a s'  J 


wc 


For  STARAN,  the  above  timing  expression  reduces  to: 


TimeA  = N[28.8M  + 1469. 94N  + 849. 38K  + 1213.77]  usees. 

Summing  up  the  times  required  for  the  individual  steps  in  the 
algorithm  as  was  done  for  Case  A,  we  obtain  the  following  timing 
expression  for  inverting  matrices  belonging  to  Case  B: 

TimeB  - N[  2 x 321^  + 2(1^,  + 2tm)  + tg  + t&  + + 2td  + 2 x 321^ 

+ (N-l)  { 2 x 32tx  + 2(tlc  + tm)  + 2tg  + 32 

+ t + t,  + t + t + t + 32t.] 
m lc  m a s TJ 

+ t + t1+t.  + t + t.  + t .+t  ], 

nc  T.c  crd  wc  lc  cri  wc 


<51 


which  simplifies  to: 


TimeB 


N[  128^  + 5tlc  + 4tm  + ts  + ta  + 2td  + 

+ (N_1)  {128tl  + 3tlc  + 4tm+3ts  + ta)]. 


+ t -,  + t . + 
era  cn 


2t 


wc 


For  STARAN,  the  above  timing  expression  reduces  to: 


TimeB  = N[4548.32  N + 743. 21]  usees. 


Notice  that  for  a 1024  word  STARAN,  the  above  timing  figure  would 
correspond  to  matrices  with  512  £ N $ 1023. 

If  we  solve  the  two  equations  (TimeA  and  Timeg)  utilizing  various 
values  of  rank  N we  obtain  the  curves  for  complex  numbers  as  shown  in 
Figures  6 and  7.  The  curves  for  real  numbers  were  taken-  from  reference 
[1].  In  Figure  6 the  break  in  the  curve  occurs  when  we  can  no  longer  fit 
two  columns  (both  real  and  Imaginary)  of  the  matrix  into  the  same  field 
of  the  arrays.  In  Figure  7 the  break  occurs  at  N * 512  when  the  algo- 
rithms change  from  Case  A to  Case  B.  Shown  in  Tables  3 and  4 are  the 
supporting  data  values  for  these  curves.  Shown  in  Figure  8 and  Table  5 
are  similar  data  for  16  arrays. 


Complex  Numbers 


MATRIX 


INVERSION 


MINUTES 


Real  Numbers 


N - RANK  OF  MATRIX 


FIGURE  6.  Matrix  Inversion  Time  with  A Arrays  and  Rank  < 511 
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MATRIX 
INVERSION 
TIME  IN 
MINUTES 
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FIGURE  7.  Matrix  Inversion  Time  with  4 Arrays  and  Rank  < 1023 


TABLE  4:  Timing  Data  for  Inverting  a Matrix  with  Complex  Numbers 


(4  Arrays) 


TIME 

HRS. 

MINS. 

SECS. 

0 

0 

1 

0 

0 

4 

0 

0 

9 

0 

0 

17 

0 

0 

27 

0 

0 

40 

0 

0 

58 

0 

1 

16 

0 

1 

36 

0 

1 

59 

0 

2 

56 

0 

3 

29 

0 

4 

5 

0 

4 

44 

0 

5 

26 

0 

6 

11 

0 

6 

59 

0 

7 

50 

0 

8 

43 

0 

9 

40 

0 

10 

6 

0 

19 

53 

0 

20 

54 

Q 

22 

56 

0 

25 

4 

0 

27 

18 

0 

29 

37 

0 

32 

2 

0 

34 

33 

0 

37 

9 

0 

39 

51 

0 

42 

39 

0 

45 

32 

0 

48 

32 

0 

51 

36 

0 

54 

48 

0 

58 

3 

1 

1 

25 

1 

4 

52 

1 

8 

26 

1 

12 

4 

1 

15 

49 

1 

19 

21 

Complex  Numbers 


M 


MATRIX 
INVERSION 
TIME  IN 
MINUTES 


7&6URE  8,  Matrix  Inversion  Time  with  16  Arrays  and  Rank  < 2000. 

57 
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TABLE  5:  Timing  Data  for  Inverting  a Matrix  with  Complex 

Numbers  for  100  < Rank  < 2000.  (16  Arrays) 


TIME 

HRS . MINS . 


100 

0 

0 

200 

0 

1 

300 

0 

2 

400 

0 

4 

500 

0 

7 

600 

0 

10 

700 

0 

15 

800 

0 

20 

900 

0 

25 

1000 

0 

31 

1100 

0 

46 

1200 

0 

55 

1300 

1 

5 

1400 

1 

15 

1500 

1 

26 

1600 

1 

38 

1700 

1 

51 

1800 

2 

5 

1900 

2 

19 

2000 

2 

34 

CONCLUSIONS 


In  Table  6 are  shown  the  inversion  times,  in  minutes,  for  inverting 
both  complex-numbered  and  real-numbered  matrices  of  various  sizes  on  the 
Good year /STARAN.  Both  a STARAN  with  4 arrays  (1024  words)  and  one  with 
16  arrays  (4096  words)  are  considered.  The  inversion  times  for  matrices 
with  real  numbers  are  taken  from  [1J.  It  is  seen  from  the  Table  that 
complex-numbered  matrices  require  about  2 1/2  to  4 times  as  -long  for 
inversion  as  do  real-numbered  matrices. 

A number  of  assumptions  have  gone  into  the  derivation  of  the  above 
inversion  times.  These  are: 

1.  The  data  corresponding  to  the  matrix  columns  are  available  in 
auxiliary  memory  when  and  where  desired, 

2.  Software  floating  point  arithmetic  is  used, 

3.  The  real  and  imaginary  parts  of  a matrix  element  each  require 
32-bits,  and 

4.  An  AP  word  is  256  bits  long. 

Assumption  1 simply  means  that  we  have  separated  the  data  management 
problem  from  the  realm  of  the  inversion  process.  That  is  to  say,  the 
inversion  times  in  Tables  3 through  6 do  not  include  the  clock  time  that 
would  be  spent  in  locating  and  accessing  the  desired  data  on  auxiliary 
storage.  A relaxation  of  this  assumption  may,  in  general,  yield  higher 


TABLE  6:  Comparative  Inversion  Times  for  Matrices  with 


Complex  Numbers  and  Matrices  with  Real  Numbers. 


4-  Array 

16-Array 

ST ARAN 

STARAN 

Complex 

Real 

Complex 

Real 

N 

Numbers  1 

lumbers 

Numbers 

Numbers 

500 

10 

4 

7 

3 

1000 

76 

19 

31 

14 

1500 

- 

86 

86 

35 

2000 

- 

152 

154 

62 

Minutes 

Minutes 

^0 


matrix  inversion  times  than  those  computed;  however,  it  may  be  possible 
to  overlap  the  locating  and  accessing  of  data  with  the  AP  processing 
operations.  The  last  observation  remains  largely  untested. 

On  the  other  hand,  as  indicated  in  assumption  2,  the  computed 
inversion  times  reflect  the  software  arithmetic  that  is  being  presently 
utilized  on  the  STARAN.  The  Goodyear  Aerospace  Corporation  has,  for 
some  time  now,  been  considering  hardware  floating  point  arithmetic  for 
the  STARAN.  Their  estimated  operation  times  for  32-bit-f ield-to-f ield 
add,  subtract,  multiply  and  divide  are  40  usees.,  40  usees.,  42  usees,  and 
110  usees,  respectively.  The  same  operations,  when  performed  common- 
register-to-f ield  are  estimated  to  require  29  usees.,  29  usees.,  29  usees, 
and  97  usees,  respectively  [1],  A comparison  of  these  figures  against 
those  in  Table  2 immediately  indicates  that  the  computed  inversion  times 
would  be  smaller  if  hardware,  rather  than  software,  arithmetic  were 
available  on  the  AP. 

Assumptions  3 and  4 were  made  primarily  to  obtain  the  numerical 
timing  data  for  the  inversion  process.  The  importance  of  these  assumptions 
lies  only  in  the  fact  that  a minimum  of  seven  working  fields  are  required 
on  the  AP  for  efficiently  performing  complex-arithmetic  operations  in 
parallel  (this  should  be  clear  from  algorithms  1 and  2),  as  the  assumptions 
allow;  otherwise  they  are  not  crucial  to  our  analysis  at  all. 
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It  should  also  be  mentioned  here  that  it  seems  likely  that  techniques 
other  than  a full  matrix  inversion  can  be  found  that  will  reduce  the 
computation  time  for  solving  systems  of  the  form  AX  = B on  an  AP.  This 
should  especially  be  true  if  the  A matrix  displays  some  special 
characteristics,  e.g.,  if  it  were  sparse  or  diagonally  dominant,  etc. 

The  computed  inversion  times  consider  only  the  Gauss-elimination  method 
and  do  not  take  into  account  any  special  characteristics  of  the  A matrix; 
they  should  consequently  be  taken  as  upper  bounds  on  the  inversion 
process  carried  out  on  an  AP. 

We  are  thus  led  to  the  conclusion  that  the  results  obtained  thus 
far  are  promising  and  that  it  is  reasonable  to  continue  our  investigations 
into  the  solution  of  these  types  of  problems  using  STARAN  like  architect- 
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